Simple Excel and ICD-10 based dataset calculator for the Charlson and Elixhauser comorbidity indices

Background The Charlson and Elixhauser Comorbidity Indices are the most widely used comorbidity assessment methods in medical research. Both methods are adapted for use with the International Classification of Diseases, which 10th revision (ICD-10) is used by over a hundred countries in the world. Available Charlson and Elixhauser Comorbidity Index calculating methods are limited to a few applications with command-line user interfaces, all requiring specific programming language skills. This study aims to use Microsoft Excel to develop a non-programming and ICD-10 based dataset calculator for Charlson and Elixhauser Comorbidity Index and to validate its results with R- and SAS-based methods. Methods The Excel-based dataset calculator was developed using the program’s formulae, ICD-10 coding algorithms, and different weights of the Charlson and Elixhauser Comorbidity Index. Real, population-wide, nine-year spanning, index hip fracture data from the Estonian Health Insurance Fund was used for validating the calculator. The Excel-based calculator’s output values and processing speed were compared to R- and SAS-based methods. Results A total of 11,491 hip fracture patients’ comorbidities were used for validating the Excel-based calculator. The Excel-based calculator’s results were consistent, revealing no discrepancies, with R- and SAS-based methods while comparing 192,690 and 353,265 output values of Charlson and Elixhauser Comorbidity Index, respectively. The Excel-based calculator’s processing speed was slower but differing only from a few seconds up to four minutes with datasets including 6250–200,000 patients. Conclusions This study proposes a novel, validated, and non-programming-based method for calculating Charlson and Elixhauser Comorbidity Index scores. As the comorbidity calculations can be conducted in Microsoft Excel’s simple graphical point-and-click interface, the new method lowers the threshold for calculating these two widely used indices. Trial registration retrospectively registered. Supplementary Information The online version contains supplementary material available at 10.1186/s12874-021-01492-7.

, chronic renal failure [12], coronary artery bypass grafting [12], diabetes [12], hip fracture [10,[13][14][15], and stroke [16]. The Charlson Comorbidity Index (CCI) is a weighted score that accounts for the presence of 19 comorbid diseases [1]. CCI was later adapted for use with administrative data based on the International Classification of Diseases [4]. Later, Quan and his colleagues (2011) provided updated weights for CCI, as treatment of some diseases has improved in time [5]. The Elixhauser comorbidity system was initially developed for administrative data when measuring the presence of 30 comorbidities [17]. Van Walraven and his colleagues (2009) later modified the initial classification system into a single weighted score -Elixhauser Comorbidity Index (ECI) [9]. Later studies have provided different weighting schemes for ECI: Thompson and AHRQ (the Agency for Healthcare Research and Quality, Canada) weights [18,19].
Currently, available CCI and ECI calculation methods are limited to a few software applications, which can only be operated through a command-line user interface, requiring R, SAS or SQL programming language skills [20][21][22][23][24][25][26][27]. Other available calculators allow measuring only a single patient's comorbidities at a time [28]. As all CCI and ECI dataset calculators require specific programming-based software, the indices' accessibility is limited for those who use other software for statistical analyses or have no prior programming experience. Thus, the accessibility of CCI and ECI can be increased by developing new methods using more user-friendly interfaces. Microsoft Excel is a widely used spreadsheet programme, and its graphical point-and-click interface allows use without programming. Thus, this study aims to use Microsoft Excel to develop a non-programming and ICD-10 based dataset calculator for CCI and ECI and to validate its results with R-and SAS-based methods.

Data validation
The data validation was based on a logic check or the reviewal of patients' medical information (Fig. 1). Firstly, patients' Nordic Medico-Statistical Committee's Classification of Surgical Procedures (NOMESCO) surgical management was reviewed to confirm their hip fracture diagnosis. Following codes confirmed the diagnosis: NFB20, NFB30, NFB40, NFB99, NFB00-9; NFB10-9, NFJ70-3, NFJ60-3, NFJ80-3, NFJ50-3 [36]. If these codes were not available, a patient's digital images and medical records were reviewed. Two national databases were used to review digital images and medical records: the Foundation of Estonian PACS (an image archiving and communication system database) and the Estonian National Health Information System (https:// ap. digil ugu. ee/ arsti porta al). Uploading medical data to both databases is mandatory by law, particularly since 2010 for medical records and since 2014 for digital images. Digital images were reviewed from January to July 2017 and medical records from January to March 2019. An orthopaedic surgeon and a radiologist reviewed the digital images, and a geriatrician reviewed the medical records.
Hip fracture diagnosis was confirmed if one or both of the data sources approved its presence.

Patients' comorbidities
Comorbidities were defined as diagnoses coded as ICD-10 at any hospital or outpatient health care claims during a four-year period: at the time of the index HF and during the preceding 4 years. The 4 year preceding period was chosen to avoid under-ascertainment of comorbidities [37]. Finally, a restriction was applied to increase the validity of comorbidity assessment: only ICD-10 codes that appeared at least two times, and at least 7 days apart were included [13,38].

Development of excel-based calculator
The Microsoft Excel-based dataset calculator was developed using ICD-10 coding algorithms [4], and different weighting schemes of CCI and ECI, the program's basic formulae and wide format (Additional file 1). The 10th revision of the International Classification of Diseases was chosen as it is used by more than a hundred countries, including Estonia, and cited in more than 20,000 scientific articles world [4,39]. The weighting schemes included the original [1] and the updated [5] CCI weights, and van Walraven [9] and AHRQ weights [18]. The calculator also takes into account the hierarchy of comorbidities: milder disease forms are excluded if a more severe one is present. Excel's basic formulae were used for making the calculator as this makes it simple and flexible for users. It calculates comorbidity scores in two steps. If cell A2 is a patient's ID and B2 contains her/his diseases as ICD-10 codes, the first step identifies the patient's comorbidity categories Wide-format, showing one subject per row, was preferred as this is the most used final data structure in statistical analysis. As ICD-10 data is occasionally in long format -one morbidity per row, simple data transformation solutions are included in the calculator's instructions and in its one-minute instructional video (Additional file 2). Data transformations were done with an Excel's add-in named Ablebits (www. ableb its. com). The add-in's functions 'Merge Duplicates' (transforms long format to wide format), 'Merge Cells' (combines codes from multiple columns into one) and 'Split Text (splits codes from one cell to multiple columns or rows; transforms wide format to long format) are useful for such purposes. The calculator allows ICD-10 codes to be inserted in any format: lowercase, uppercase, with or without punctuation, and any separators can be used between diagnoses. Finally, the calculator's ability to identify all ICD-10 codes used in CCI and ECI was tested since the used hip fracture population may not cover all of the diseases used in the indicies. The calculator identified all ICD-10 codes used in the two indices.

Statistical analysis
Continuous variables were presented as "median (25 th -75 th percentile)" and categorical as proportions. The patients' Charlson weight comorbidity score and the presence of different diseases were calculated using the Excel-based calculator and the R package "comorbidity" [21] and two SAS macros [40,41]. The Excelbased calculator was validated by comparing the three methods' results. The calculators' processing speeds were compared using the study's data (multiplicated for larger sample sizes). Excel-based calculator's processing speed was assessed by running formulae in all columns at once.

Patients and their comorbidities
A total of 11,491 patients were included in the study (Fig. 1). Their median age was 81 years (73-87), 72% (8246) were female, and 51% (5883) had an intracapsular fracture. The Excel-based calculator's results are presented in two tables: the original and the updated weight CCI scores in Table 1; and AHRQ and van Walraven weight ECI scores in Table 2.  calculator's results were consistent, revealing no discrepancies, with the R-and SAS-based methods.

Processing speed
The Excel-based calculator performed well with sample sizes of up to 200,000 patients, showing a processing time from 2 s up to 4 min and 10 s (Fig. 2). However, calculating comorbidities for 400,000 patients took 21 min and 32 s for CCI and 36 min and 34 s for ECI with the used hard-and software. In contrast, the R-and SAS-based calculators performed all calculations in less than 22 s.

Discussion
This study proposes a novel, simple and validated tool for the most used comorbidity indices in medical research -CCI and ECI. The compared methods performed similarly in terms of accuracy, although the new tool has advantages and disadvantages that should be considered. The main advantage of the Excel-based method is its ease of use: comorbidity scores can be calculated by just copying and pasting patients' identification numbers and ICD-10 codes from one spreadsheet to another. This can be done in a simple graphical point-and-click interface, requiring no coding skills from its user. However, the Excel-based calculator has limitations. Other programming-based methods allow calculating CCI and ECI with earlier versions or adaptions of the International Classification of Diseases: ICD-9, ICD-9-CM, Enhanced ICD-9-CM codes, or ICD-10-CM [5,[20][21][22][23][24][25][26][27]. The new calculator's processing speed is reasonable with datasets of up to 200,000 patients and relatively capable hardware, taking up to few minutes in total. Still, it may take a considerable amount of time with larger data. This is explained by the calculator's formulae-based nature, as they are duplicated in millions of spreadsheet cells, requiring a considerable amount of computing power. Computers with better hardware specifications (especially central processing unit's [CPU] speed, random-access memory [RAM]) and 64bit version Microsoft Excel are therefore recommended for analysing large data. On the other hand, most medical research studies examine smaller sample sizes, large data splitting is always an option, and an hour-long calculation still takes significantly less time than learning to code. Another limitation is that Excel's spreadsheets are limited to slightly over a million rows. Therefore, large long-format data may require splitting and should be prepared using multiple sheets. Ultimately, the final choice between using the Excelbased and other methods depends on a user's skills, preference, available software and needs. All these factors vary among researchers. The new calculator may be useful for users preferring Microsoft Excel to prepare or analyse data, or those who have no programming skills, or whose used statistical software does not have a module for calculating CCI or ECI. Thus, the new simple Excel-based method lowers the threshold for calculating CCI and ECI, making these indices accessible to a broader audience.

Conclusions
This study proposes a novel, validated, non-programming based method for calculating two of the most used comorbidity indices in medical research -CCI and ECI. The Excel-based calculator allows calculating these comorbidity indices by simply copying and pasting data in a graphical point-and-click interface, thereby lowering the threshold for calculating CCI and ECI. The method may be useful for users preferring Microsoft Excel to prepare or analyse data, or those who have no programming skills, or whose statistical software does not have a corresponding module. The calculator's slower processing speed is a downside that should be taken into account with very large datasets or less capable hardware or 32bit version of Microsoft Excel.